Abstract

We study the uncertainty in different two-point correlation function (2PCF) estimators in currently 
available galaxy surveys. This is motivated by the active subject of using the baryon acoustic oscilla- 
tions (BAOs) feature in the correlation function as a tool to constrain cosmological parameters, which 
requires a fine analysis of the statistical significance. 

We discuss how estimators are affected by both the uncertainty in the mean density $\bar{n}$ and the
integral constraint $\frac{1}{V^2}\int_{V^2} \hat{\xi}(r)\, d^3r = 0$, which necessarily causes a bias. We quantify both effects for
currently available galaxy samples using simulated mock catalogues of the Sloan Digital Sky Survey
(SDSS) following a lognormal model, with a Lambda-Cold Dark Matter (ΛCDM) correlation function
and similar properties as the samples (number density, mean redshift for the ΛCDM correlation function,
survey geometry, mass-luminosity bias). Because we need extensive simulations to quantify small
statistical effects, we cannot use realistic N-body simulations and some physical effects are neglected.

Our simulations still enable a comparison of the different estimators by looking at their biases
and variances. We also test the reliability of the BAO detection in the SDSS samples and study the
compatibility of the data results with our ΛCDM simulations.



Introduction 

The correlation function $\xi$ is the most popular tool for analyzing the distribution of galaxies [21].
Any model, in particular the standard ΛCDM model, predicts a certain shape for $\xi(r)$ with a dependence
on the cosmic parameters. Among the predictions, BAOs should leave an imprint on the matter correlation
function. This imprint is a relic of the sound waves in the early Universe, when baryons and photons were
coupled in a relativistic plasma before recombination, which caused the wave propagation to end [4]. It can be seen
as a small peak in the correlation function at a scale $r_s$ corresponding to the comoving distance of the
sound horizon.

The detection and localization of BAOs [5] give a confirmation of the cosmological paradigm and
a tool to constrain cosmological parameters. The detection of BAOs in the Cosmic Microwave Background
(CMB) provides the scale $r_s = 153.3$ Mpc and allows one to constrain a combination of the Hubble
parameter $H(z)$ and the comoving angular diameter distance $D_A(z)$ (see e.g. [12], [7]). Further, using the
value of $\Omega_m h^2$, also well constrained by CMB measurements, the BAO scale restricts the preferred
regions for $\Omega_m$ and $h$.

The main difficulty for detecting and analyzing BAOs in large scale structures comes from the 
low statistical significance of the signal. It can only be seen on the widest redshift surveys, and has 
been most significantly detected in samples including Luminous Red Galaxies (LRGs). In addition to 
the statistical uncertainty the signal is affected by observational effects that may not be taken into 
account correctly, such as redshift distortions, scale-dependent mass-luminosity bias in the population 
of galaxies or wrong redshift to distance conversion. 

We will not study these systematic effects; instead we focus on the statistical uncertainty in the 
BAO signal estimation through correlation functions. There are two types of statistical uncertainties. 
The first one comes from cosmic fluctuations due to limited sample volume, and the other one from
the finite number of galaxies which do not trace exactly the underlying field (i.e. shot noise). 

There are various estimators of the correlation function. Their bias expresses the difference between 
their expected value and the value of the physical quantity of concern. Estimators are also subject to 
variance. In practice there is no way to evaluate the bias of the estimator if it exists, and it must be 
considered itself as a source of uncertainty, in addition to the estimator's variance. 

The usual criteria for comparing statistical estimators involve both the variance and the bias. For example,
when measuring the quality in terms of mean-squared error, biased estimators can outperform
unbiased ones. This is the well-known bias-variance tradeoff, which depends on the way we measure the
quality of estimation. For some cosmological analyses, the presence of a bias could be problematic if
not taken into account. For example, fitting model correlation functions to the data, taking into
account only the covariance matrix and not the bias, would lead to a false estimation of confidence intervals
for the model parameters.

For our study, we use simulations with a ΛCDM power spectrum on the same volume as the data and
with the same estimated parameters (density of galaxies, mass-luminosity bias, mean redshift for the
power spectrum). Our simulations assume a lognormal model (described in section 2.2) for the density
field as proposed by [2J, which has proven to be valid for a good range of scales. The model used has
physically motivated features, although it is not entirely realistic. It does not completely take into
account the systematic effects mentioned above: redshift distortions, scale-dependent mass-luminosity
bias in the population of galaxies, wrong redshift to distance conversion.

There have been several studies to compare the different estimators of the correlation function ([26],
[18]). Here we perform similar comparisons for current galaxy surveys, focusing on large-scale effects
and BAO detection. We arrive at similar conclusions as previous studies when ranking estimators in
terms of performance. Our second goal is more specific, focusing on the bias caused by the integral
constraint for correlation function estimation. Such a bias is expected for all sizes of survey in a
fractal Universe [3T] and below the scale of homogeneity in the standard cosmological model. We
study whether this systematic effect alters the estimation or can be neglected in current galaxy surveys, in
particular for BAO study.

The plan of the paper is as follows. In section 1.1 we present the different estimators of the correlation
function that we consider. We recall some of their properties, in particular their sensitivity to the
uncertainty in the mean density $\bar{n}$ in section 1.2 and the bias imposed by the integral constraint in section 1.3. In
section 2.1 we present the SDSS samples that we want to mimic with our simulations (one LRG sample and
one main sample), and in section 2.2 the lognormal model and our procedure for fitting simulation parameters
to the data. Finally in section 3 we perform the analysis of the uncertainty in the $\xi$ estimation. We compare
the quality of the different estimators in section 3.1 and look at the effect of the integral constraint in the
simulations in section 3.2. In section 3.3 we look at the reliability of the BAO detection in the SDSS LRG and main
samples, and see in section 3.4 whether the $\xi$ estimated on the data is compatible with our lognormal simulations
with a given ΛCDM power spectrum.



1. 2PCF estimators and bias 
1.1. 2PCF estimators 

The two-point correlation function is a second order statistic that describes the clustering of a field
or a point process. More precisely, $\xi(r)$ measures the excess probability of finding a pair of points in
two volumes $dV_1$ and $dV_2$ at distance $r$, compared to a random distribution:

$$dP_{12} = \bar{n}^2 [1 + \xi(r)]\, dV_1\, dV_2 \qquad (1)$$

where $\bar{n}$ is the expected density of the distribution.

Computing the correlation function requires a 3D map of galaxies. In practice galaxies are
located using their angular position on the sky and their distance from the observer. The distance
of a galaxy is obtained indirectly from its redshift, which can be measured with high precision using
spectroscopy. The redshift is then converted into a distance using the redshift-distance relation of an
assumed cosmological model.
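The redshift-to-distance conversion can be sketched as follows. This is a minimal illustration, assuming a flat ΛCDM model with the values $\Omega_M = 0.27$ and $\Omega_\Lambda = 0.73$ adopted in section 2.1 and neglecting radiation; the function name `comoving_distance` and the midpoint integration are choices made here for illustration, not the procedure used for the paper.

```python
import numpy as np

# Hubble distance c / H0 for H0 = 100 h km/s/Mpc, in h^-1 Mpc.
C_OVER_H100 = 2997.92458

def comoving_distance(z, omega_m=0.27, omega_lambda=0.73, n_steps=2000):
    """Comoving distance in h^-1 Mpc for a flat LCDM model (midpoint rule)."""
    edges = np.linspace(0.0, z, n_steps + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    # Dimensionless expansion rate E(z) = H(z) / H0, radiation neglected.
    e_of_z = np.sqrt(omega_m * (1.0 + mid) ** 3 + omega_lambda)
    return C_OVER_H100 * np.sum(1.0 / e_of_z) * (z / n_steps)

if __name__ == "__main__":
    for z in (0.038, 0.119):
        print(z, round(comoving_distance(z), 2))
```

With these assumptions the redshift limits of the main1 sample map to distances close to the distance limits quoted for that sample in Table 2.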
There are various estimators of the correlation function, most using random catalogues with identical
geometry to measure this excess of probability. Let $N_D$ and $N_R$ be the number of galaxies
respectively in the data and random catalogues. We define $DD(r)$, $RR(r)$ and $DR(r)$ as the number
of pairs at a distance in $[r \pm dr/2]$ of respectively data-data, random-random and data-random galaxies.
We also define $N_{DD}$, $N_{RR}$ and $N_{DR}$ as the total number of corresponding pairs in the (real or random)
catalogue. With the convention of counting pairs only once we have:

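$$N_{DD} = \frac{N_D (N_D - 1)}{2} \qquad (2)$$

$$N_{RR} = \frac{N_R (N_R - 1)}{2} \qquad (3)$$

$$N_{DR} = N_R N_D \qquad (4)$$

In this paper we will use 4 different estimators, Peebles-Hauser [25], Davis-Peebles [5], Hamilton
[15] and Landy-Szalay [20], which have the following expressions:

$$\begin{aligned}
\hat{\xi}_{PH}(r) &= \frac{N_{RR}}{N_{DD}} \frac{DD(r)}{RR(r)} - 1 \\
\hat{\xi}_{DP}(r) &= \frac{N_{DR}}{N_{DD}} \frac{DD(r)}{DR(r)} - 1 \\
\hat{\xi}_{Ham}(r) &= \frac{N_{DR}^2}{N_{DD} N_{RR}} \frac{DD(r)\, RR(r)}{[DR(r)]^2} - 1 \\
\hat{\xi}_{LS}(r) &= 1 + \frac{N_{RR}}{N_{DD}} \frac{DD(r)}{RR(r)} - 2 \frac{N_{RR}}{N_{DR}} \frac{DR(r)}{RR(r)}
\end{aligned} \qquad (5)$$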

Estimating $\xi$ would be easier knowing the exact number of points in the volume expected from the
distribution. In practice we can only estimate it with the empirical quantities $N_D$ and $N_{DD}$. We show
in section 1.2 that the Hamilton and Landy-Szalay estimators depend on this uncertainty in the mean
density only at second order, and thus perform better. Moreover, the Landy-Szalay estimator has been proven in [20] to be
nearly of minimal variance for a random distribution (i.e. Poisson with no correlation).
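The four estimators can be computed from binned pair counts as in the sketch below. It assumes the standard forms of the Peebles-Hauser, Davis-Peebles, Hamilton and Landy-Szalay estimators and the convention of counting each pair once; the function names `pair_totals` and `estimators` are hypothetical, and the pair counts themselves are taken as given.

```python
import numpy as np

def pair_totals(n_d, n_r):
    """Total pair numbers, each pair counted once."""
    n_dd = n_d * (n_d - 1) / 2.0
    n_rr = n_r * (n_r - 1) / 2.0
    n_dr = n_d * n_r
    return n_dd, n_rr, n_dr

def estimators(dd, dr, rr, n_d, n_r):
    """Return the four 2PCF estimators for binned counts DD(r), DR(r), RR(r)."""
    dd, dr, rr = (np.asarray(c, dtype=float) for c in (dd, dr, rr))
    n_dd, n_rr, n_dr = pair_totals(n_d, n_r)
    ph = (n_rr / n_dd) * dd / rr - 1.0
    dp = (n_dr / n_dd) * dd / dr - 1.0
    ham = (n_dr ** 2 / (n_dd * n_rr)) * dd * rr / dr ** 2 - 1.0
    ls = 1.0 + (n_rr / n_dd) * dd / rr - 2.0 * (n_rr / n_dr) * dr / rr
    return {"PH": ph, "DP": dp, "Ham": ham, "LS": ls}
```

When the three normalized counts are identical in every bin (no excess of data pairs), all four estimators return zero, which is a quick sanity check for an implementation.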

1.2. Uncertainty in the mean density 

We show the calculations given in [15] in a simple case where the sample is volume-limited (i.e.
with a constant expected density in the sample), so that the optimal strategy is to weight all galaxies
equally. The empirical density in the catalogue $n$ is a sum of Dirac functions on the galaxies of the
catalogue. If $\bar{n}$ is the expected density then $\delta$ is the relative fluctuation in the sample:

$$\delta = \frac{n - \bar{n}}{\bar{n}} \qquad (6)$$

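We write $W$ for the indicator function of the sample volume and $\langle \cdot \rangle$ for the integration over the volume. For
example $\langle W(x)\, n(x) \rangle$ is the integral of the empirical density and thus equals the number of points
in the sample. We introduce the following quantities (where $\bar{\delta}$ and $\Psi$ have statistical expectations
of 0):

$$\bar{\delta} = \frac{\langle W(x)\, \delta(x) \rangle}{\langle W(x) \rangle} \qquad (7)$$

$$\Psi(r) = \frac{\langle \delta(x)\, W(x) W(y) \rangle_r}{\langle W(x) W(y) \rangle_r} \qquad (8)$$

$$\hat{\xi}(r) = \frac{\langle \delta(x)\, \delta(y)\, W(x) W(y) \rangle_r}{\langle W(x) W(y) \rangle_r} \qquad (9)$$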
where $\langle \cdot \rangle_r$ means a double integration over the volume, restricted to $x$ and $y$ separated by a distance
in $[r \pm dr/2]$. $\hat{\xi}$ is an unbiased estimator of the real $\xi$ but we cannot calculate it since we do not know
$\bar{n}$ and $\delta$.

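With short calculations it is possible to express the different estimators in terms of the quantities $\hat{\xi}$, $\bar{\delta}$
and $\Psi$ [H]:

$$\hat{\xi}_{PH}(r) = \frac{\hat{\xi}(r) + 2\Psi(r) - 2\bar{\delta} - \bar{\delta}^2}{[1 + \bar{\delta}]^2} \qquad (10)$$

$$\hat{\xi}_{DP}(r) = \frac{\hat{\xi}(r) + \Psi(r) - \bar{\delta} - \bar{\delta}\,\Psi(r)}{[1 + \bar{\delta}]\,[1 + \Psi(r)]} \qquad (11)$$

$$\hat{\xi}_{Ham}(r) = \frac{\hat{\xi}(r) - \Psi(r)^2}{[1 + \Psi(r)]^2} \qquad (12)$$

$$\hat{\xi}_{LS}(r) = \frac{\hat{\xi}(r) - 2\bar{\delta}\,\Psi(r) + \bar{\delta}^2}{[1 + \bar{\delta}]^2} \qquad (13)$$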

These formulas explain the superiority of the Hamilton and Landy-Szalay estimators, in which the $\Psi$ and $\bar{\delta}$
terms appear only at second order in the numerator. Terms in the denominator are not important since they
generate a small relative error, whereas terms in the numerator can generate a high relative error
when their values become non-negligible compared to $\xi$. For the Hamilton and Landy-Szalay estimators,
the error is dominated by that of $\hat{\xi}$ and is hardly affected by $\Psi$ and $\bar{\delta}$, which are linked to the
uncertainty in $\bar{n}$.

With these formulas we see that the estimators are biased in the general case. Indeed $\bar{\delta}$ and $\Psi(r)$
have expected value 0 and $\hat{\xi}(r)$ has expected value $\xi(r)$, but the terms are combined in multiplications
and divisions. So we do not get the expected value of the left-hand side by replacing each term by its
expected value in the right-hand side of equations (10), (11), (12), (13).
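The sensitivity of each estimator to the mean-density uncertainty can be checked numerically with a short sketch. It assumes, as in the volume-limited calculation above and neglecting shot noise, that the normalized pair counts scale as $DD/N_{DD} \propto (1 + 2\Psi + \hat{\xi})/(1 + \bar{\delta})^2$ and $DR/N_{DR} \propto (1 + \Psi)/(1 + \bar{\delta})$ relative to $RR/N_{RR}$; the function names and the numerical values of $\hat{\xi}$, $\Psi$ and $\bar{\delta}$ are illustrative only.

```python
def from_counts(xi_hat, psi, delta_bar):
    """Estimators built from normalized pair counts, relative to RR(r)/N_RR."""
    dd = (1.0 + 2.0 * psi + xi_hat) / (1.0 + delta_bar) ** 2  # DD/N_DD over RR/N_RR
    dr = (1.0 + psi) / (1.0 + delta_bar)                      # DR/N_DR over RR/N_RR
    return {
        "PH": dd - 1.0,
        "DP": dd / dr - 1.0,
        "Ham": dd / dr ** 2 - 1.0,
        "LS": 1.0 + dd - 2.0 * dr,
    }

def closed_form(xi_hat, psi, delta_bar):
    """Same estimators written directly in terms of xi_hat, Psi and delta_bar."""
    d = delta_bar
    return {
        "PH": (xi_hat + 2.0 * psi - 2.0 * d - d ** 2) / (1.0 + d) ** 2,
        "DP": (xi_hat + psi - d - d * psi) / ((1.0 + d) * (1.0 + psi)),
        "Ham": (xi_hat - psi ** 2) / (1.0 + psi) ** 2,
        "LS": (xi_hat - 2.0 * d * psi + d ** 2) / (1.0 + d) ** 2,
    }

if __name__ == "__main__":
    xi_hat, psi, delta_bar = 0.01, 0.003, -0.002  # illustrative values
    for name, value in closed_form(xi_hat, psi, delta_bar).items():
        print(name, value - xi_hat)  # error relative to the ideal estimator
```

For these illustrative values the Peebles-Hauser and Davis-Peebles errors are of the order of $\Psi$ and $\bar{\delta}$ themselves, whereas the Hamilton and Landy-Szalay errors are of the order of their products and squares.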



1.3. The integral constraint 

The random catalogue is used to measure an excess of pairs compared to a random distribution.
Equivalently it can be seen as a tool to calculate volumes. Let $V$ be the domain of the sample. If we
take the limit $N_R \to \infty$:

$$\frac{RR(r)}{N_{RR}} = \frac{\#\ \text{pairs at distance}\ r' \in [r \pm dr/2]}{\#\ \text{pairs}}$$

$$f(r) \stackrel{\text{def}}{=} \lim_{N_R \to \infty} \frac{RR(r)}{N_{RR}} = \frac{1}{|V|^2} \int_V d^3x \int_V d^3y\ \mathbf{1}_{|y - x| \in [r \pm dr/2]} \qquad (14)$$

To simplify the text we define $I$ and $\hat{I}_{PH}$, $\hat{I}_{DP}$, $\hat{I}_{H}$, $\hat{I}_{LS}$ ($\hat{I}$ when referring to any estimator) as the
values of the integration against $f(r)$ for the real correlation and for the different estimators:

$$I = \int_0^{r_{max}} f(r)\, \xi(r) \qquad (15)$$

$$\hat{I}_{PH} = \int_0^{r_{max}} f(r)\, \hat{\xi}_{PH}(r) \qquad (16)$$

with $r_{max}$ the maximum distance between 2 points in the volume.

We will show that there is a constraint on the Peebles-Hauser estimator $\hat{\xi}_{PH}(r)$ imposing the
following equality, regardless of the real function $\xi(r)$ that is estimated:

$$\hat{I}_{PH} = 0 \qquad (17)$$
For a smooth sample and small separation $r$, the inner integral in equation (14) equals, for nearly
all $x$, the volume of the spherical shell $V_r$ of points $y$ with $|y - x| \in [r \pm dr/2]$. So for small $r$ we get

$$f(r) \approx \frac{|V_r|}{|V|} = \frac{4 \pi r^2 dr}{|V|}$$

and if this were the case for all $r$ the constraint (17) would become:

$$\int \hat{\xi}_{PH}(r)\, d^3r = 0. \qquad (18)$$

But when $r$ becomes non-negligible compared to the sample size, $f(r) \neq 4 \pi r^2 dr / |V|$, and so the constraint
(17) is different from (18) and depends on the sample volume and geometry.
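The departure of $f(r)$ from the small-separation form $4 \pi r^2 dr / |V|$ can be illustrated by Monte Carlo for a cubic sample. The sketch below draws random pairs of points in a cube of side $a$ and histograms their separations; the cube, the bin width and the number of pairs are illustrative assumptions.

```python
import numpy as np

def pair_weight_cube(a, edges, n_pairs=1_000_000, seed=0):
    """Monte Carlo estimate of f(r): fraction of random pairs in each bin."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, a, size=(n_pairs, 3))
    y = rng.uniform(0.0, a, size=(n_pairs, 3))
    dist = np.linalg.norm(x - y, axis=1)
    counts, _ = np.histogram(dist, bins=edges)
    return counts / n_pairs

def shell_fraction(a, edges):
    """Volume of the spherical shell of each bin divided by the cube volume."""
    r_lo, r_hi = edges[:-1], edges[1:]
    return 4.0 * np.pi / 3.0 * (r_hi ** 3 - r_lo ** 3) / a ** 3

a = 1.0
edges = np.arange(0.0, 1.8, 0.05) * a      # covers the largest separation a*sqrt(3)
f = pair_weight_cube(a, edges)
ratio = f / shell_fraction(a, edges)        # close to 1 only for r much smaller than a
```

The weights sum to 1 because every pair falls in some bin, and the ratio to the shell volume drops well below 1 once $r$ is comparable to the cube side, which is where the constraint starts to depend on the sample geometry.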



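Let us show the relation (17) for the Peebles-Hauser estimator. Using $RR(r)/N_{RR} = f(r)$,

$$\hat{\xi}_{PH}(r) = \frac{N_{RR}}{N_{DD}} \frac{DD(r)}{RR(r)} - 1 = \frac{1}{f(r)} \frac{DD(r)}{N_{DD}} - 1 \qquad (19)$$

In practice the integration consists in making the sum over all bins $r_i$ of the correlation function
estimated up to $r_{max}$:

$$\hat{I}_{PH} = \sum_i f(r_i) \left[ \frac{1}{f(r_i)} \frac{DD(r_i)}{N_{DD}} - 1 \right] = \frac{1}{N_{DD}} \sum_i DD(r_i) - \sum_i f(r_i) = 1 - 1 = 0$$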
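The cancellation can be verified on a toy catalogue. The sketch below is an illustration under stated assumptions: a deliberately non-uniform set of 200 data points in a unit cube, 1000 random points, 6 bins covering all separations, and $f(r_i)$ approximated by $RR(r_i)/N_{RR}$ from the finite random catalogue. The function names are hypothetical.

```python
import numpy as np

def pair_distances(points):
    """Distances of all distinct pairs, each pair counted once."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return dist[upper]

def ph_constraint(data, randoms, edges):
    """Return sum_i f(r_i) xi_PH(r_i) and xi_PH, with f = RR / N_RR."""
    dd, _ = np.histogram(pair_distances(data), bins=edges)
    rr, _ = np.histogram(pair_distances(randoms), bins=edges)
    n_dd, n_rr = dd.sum(), rr.sum()   # bins cover all pairs, so these are the totals
    f = rr / n_rr
    xi_ph = (n_rr / n_dd) * dd / rr - 1.0
    return np.sum(f * xi_ph), xi_ph

rng = np.random.default_rng(1)
a = 1.0
data = a * rng.uniform(size=(200, 3)) ** 2      # clustered towards one corner
randoms = a * rng.uniform(size=(1000, 3))
edges = np.linspace(0.0, a * np.sqrt(3.0), 7)   # 6 bins up to r_max
i_ph, xi_ph = ph_constraint(data, randoms, edges)
print(i_ph, xi_ph)
```

The weighted sum vanishes up to rounding error even though the data are strongly clustered, so the positive values of the estimator at small separation are necessarily compensated by negative values at larger separation.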



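It is possible to show that the same constraint $\hat{I} = 0$ is approximately verified for the other
estimators. For this we need to simplify $DR(r)$ in the limit $N_R \to \infty$:

$$g(r) \stackrel{\text{def}}{=} \lim_{N_R \to \infty} \frac{DR(r)}{N_D N_R} = \frac{1}{N_D} \sum_{i=1}^{N_D} \frac{\#\ \text{random points}\ r_j\ \text{s.t.}\ |r_j - d_i| \in [r \pm dr/2]}{\#\ \text{random points}} \qquad (20)$$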



This function $g(r)$ depends on the point positions in the catalogue. We can make another approximation
if the correlation length is small compared to the size of the volume and if there are enough data
points. Then data points are approximately uniformly distributed in the volume, and we can replace
the mean over data positions by the mean over the volume:

$$g(r) \approx \frac{1}{|V|} \int_V d^3x \left( \frac{1}{|V|} \int_V d^3y\ \mathbf{1}_{|y - x| \in [r \pm dr/2]} \right) = f(r) \qquad (21)$$

Under this approximation all estimators are equivalent and verify the integral constraint. However, the
other estimators rely on this last approximation, which the Peebles-Hauser estimator does not need, so for
them the constraint should be less tight.

We see again that the estimators are biased in the general case. The real correlation function need
not satisfy the constraint, whereas the estimators do (approximately) verify it, and thus so do their expected
values.



1.4. Effect of the integral constraint on the bias

To show how the constraint can affect the correlation function estimation we generated realizations
of a segment Cox process (see [23]). The field consists of segments of length $l$ randomly distributed in
the volume and points randomly distributed on each segment. The intensity of the point process $\lambda$ is
equal to the mean length of segments per unit volume $L_V$ times the mean number of points per unit
length $\lambda_l$. This process is easy to sample and its correlation function is known analytically [30]:

$$\xi(r) = \begin{cases} \dfrac{1}{2 \pi r^2 L_V} - \dfrac{1}{2 \pi r\, l\, L_V} & \text{for } r \le l \\ 0 & \text{for } r > l \end{cases} \qquad (22)$$
It is always non-negative, so the integral constraint forces spurious negative values for the estimators.
We considered the process with segment length $l = 10$ (units here are arbitrary), a mean length per unit
volume $L_V = 0.1$ and a mean number of points per unit length $\lambda_l = 1.8$. We calculated the correlation
function estimators on 2000 cubes of size $a = 10$, 2000 cubes of size $a = 20$ and 512 cubes of size
$a = 50$. In figure 1 we plot the mean value of the estimators over the samples and the empirical standard deviation $\sigma$.
To exemplify the presence of the bias we show in the insets the empirical $\sigma$ divided by $\sqrt{N}$, with $N$
the number of realizations, which gives the uncertainty in the empirical mean. A difference between
the empirical mean and the real $\xi$ much larger than $\sigma/\sqrt{N}$ means a bias is present in the estimators.
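A segment Cox process of this kind can be sampled as in the sketch below. It is a minimal illustration, not the code used for the paper: segment centres are drawn in a cube enlarged by half a segment length on each side so that segments entering from outside are included, segment directions are isotropic, and the analytic correlation function is taken in the standard form for this process. All function and parameter names are hypothetical.

```python
import numpy as np

def xi_segment_cox(r, seg_len, l_v):
    """Analytic correlation function of the segment Cox process (r > 0)."""
    r = np.asarray(r, dtype=float)
    xi = 1.0 / (2.0 * np.pi * r ** 2 * l_v) - 1.0 / (2.0 * np.pi * r * seg_len * l_v)
    return np.where(r <= seg_len, xi, 0.0)

def sample_segment_cox(a, seg_len, l_v, lam_l, rng):
    """Points of a segment Cox process inside the cube [0, a)^3."""
    pad = seg_len / 2.0
    side = a + 2.0 * pad
    # Number density of segments is L_V / l (total length per volume / length).
    n_seg = rng.poisson(l_v / seg_len * side ** 3)
    centres = rng.uniform(-pad, a + pad, size=(n_seg, 3))
    directions = rng.standard_normal((n_seg, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    n_pts = rng.poisson(lam_l * seg_len, size=n_seg)     # points on each segment
    seg_idx = np.repeat(np.arange(n_seg), n_pts)
    t = rng.uniform(-0.5, 0.5, size=seg_idx.size) * seg_len
    points = centres[seg_idx] + t[:, None] * directions[seg_idx]
    inside = np.all((points >= 0.0) & (points < a), axis=1)
    return points[inside]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = sample_segment_cox(a=20.0, seg_len=10.0, l_v=0.1, lam_l=1.8, rng=rng)
    print(len(pts), "points; expected about", 0.1 * 1.8 * 20.0 ** 3)
```

With the parameters quoted in the text the expected intensity is $\lambda = L_V \lambda_l = 0.18$ points per unit volume, i.e. about 1440 points in a cube of size $a = 20$.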

We observe that a bias is present for all estimators and for all sizes of cubes. As expected it becomes
smaller when the sample size increases, just as the variance decreases. For the Landy-Szalay and Hamilton
estimators the bias also decreases faster than the estimators' variances. The bias approximately equals
half of the standard deviation $\sigma$ in a large region for $a = 10$ and $a = 20$, whereas it is very small
compared to it for $a = 50$. Biases are similar for the different estimators for $a = 10$ and $a = 20$, although
Landy-Szalay and Hamilton have smaller variances than Peebles-Hauser and Davis-Peebles.

The effect of the bias is to force negative values at intermediate scales, so that the weighted sum in
$\hat{I}$ approaches 0. Figure 2 shows the weighted estimators $f(r_i)\hat{\xi}(r_i)$ and how the integral cancels for the
estimators and not for the real $\xi$. The effect is clear for $a = 20$ and $a = 10$ (the latter is not shown because results
for $a = 10$ and for $a = 20$ have similar trends). However for $a = 50$, the bias does not come entirely from
the integral constraint, as the weighted function takes alternately negative and positive values. So
the small bias that is still present could come from other effects (e.g. finite number of random points).
Figure 1: Mean and $\sigma$ for the different estimators on $N = 2000$ Cox realizations for cube size $a = 20$ (left panel) and
$N = 512$ realizations for $a = 50$ (right panel). We plot the analytic function (black), Peebles-Hauser (purple),
Davis-Peebles (light blue), Hamilton (green), Landy-Szalay (red). In the insets we zoom on the biased region with error bars $\sigma/\sqrt{N}$,
which is approximately the standard deviation of the mean on $N$ realizations.

Figure 2: Weighted estimators $f(r_i)\hat{\xi}(r_i)$ for $a = 20$ (left panel) and $a = 50$ (right panel). We plot it for the analytic
function (black), Peebles-Hauser (purple), Davis-Peebles (light blue), Hamilton (green), Landy-Szalay (red).

Table 1 shows the value of $I$ for the real $\xi$ and $\hat{I}$ for the estimators' means. The constraint is
nearly satisfied ($\hat{I} \approx 0$), especially for Peebles-Hauser, even when the real $\xi$ does not verify it ($I \gg \hat{I}$).
The weight function $f$ sums to 1 (see equation (14)), so the difference between $\sum_i f(r_i)\xi(r_i)$ and
$\sum_i f(r_i)\hat{\xi}(r_i) \approx 0$ implies on average a similar difference between $\xi$ and $\hat{\xi}$. Negative bias may
compensate positive bias in the integral, so this can be an underestimate.

For the Landy-Szalay and Hamilton estimators the constraint gets weaker between $a = 20$ and
$a = 50$. These values of $a$ correspond to values of $I$ for the real $\xi$ of approximately 0.01 and 0.001. A
quantity which is more intuitive than $I$ is the normalized mass variance inside a sample $V$:

$$\sigma^2(V) = \frac{E[M(V)^2] - E[M(V)]^2}{E[M(V)]^2} \qquad (23)$$

$\sigma(V)$ represents the fluctuation of mass in the sample. It can be shown that $I$ is equal to $\sigma^2(V)$ up
to the shot noise variance (see [H]), which can usually be neglected. Thus we can express conditions
for the constraint to be weak or negligible in terms of the $\sigma(V)$ value. The cubic samples with $a = 20$
and $a = 50$ correspond respectively to $\sigma(V) \approx 0.1$ and $\sigma(V) \approx 0.03$. So the constraint still affects the
estimation for a 10% homogeneity level and starts to be weak for a 3% homogeneity level.
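The relation between $I$ and $\sigma(V)$ can be checked with simple arithmetic on the values of Table 1. The sketch below compares $\sqrt{I}$ with the tabulated $\sigma(V)$, and also evaluates a rough large-volume estimate $I \approx l / (L_V a^3)$, which is an assumption introduced here: it follows from replacing $f(r)$ by $4 \pi r^2 dr / |V|$ and integrating the analytic segment Cox correlation function, so it is only meaningful when $a$ is much larger than $l$.

```python
import math

# (I for the real correlation function, sigma(V)) as read from Table 1.
table_1 = {10: (0.059, 0.24), 20: (9.6e-3, 0.098), 50: (7.8e-4, 0.027)}
seg_len, l_v = 10.0, 0.1   # parameters of the segment Cox process

for a, (i_real, sigma_v) in table_1.items():
    i_large_volume = seg_len / (l_v * a ** 3)   # rough estimate, valid for a >> l
    print(f"a={a}: sqrt(I)={math.sqrt(i_real):.3f}, sigma(V)={sigma_v}, "
          f"l/(L_V a^3)={i_large_volume:.2e}")
```

The square root of $I$ agrees with $\sigma(V)$ to within a few percent for the three cube sizes, and the large-volume estimate approaches the tabulated $I$ only for $a = 50$, as expected from edge effects in the smaller cubes.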





          $I$            $\sigma(V)$   $\hat{I}_{PH}$   $\hat{I}_{DP}$   $\hat{I}_{H}$    $\hat{I}_{LS}$
a = 10    0.059          0.24          2.87 x 10^-5     9.24 x 10^-5     7.16 x 10^-3     0.0125
a = 20    9.6 x 10^-3    0.098         1.88 x 10^-5     -5.54 x 10^-4    2.23 x 10^-4     1.47 x 10^-3
a = 50    7.8 x 10^-4    0.027         3.8 x 10^-?      -7.04 x 10^-5    -1.86 x 10^-5    1.02 x 10^-4

Table 1: Values of $I$, $\sigma(V)$ and $\hat{I}$ for the different estimators on cube sizes $a = 10$, $a = 20$ and $a = 50$



2. Samples and simulations 

2.1. SDSS galaxy samples 

We want to test the reliability of the correlation function estimation on current galaxy surveys.
The largest survey to date is the SDSS, with a final version in Data Release 7 (DR7, [S]). It contains
a magnitude-limited sample of galaxies (main) and a nearly volume-limited sample of LRGs. For all
catalogues we adopt a ΛCDM cosmological model with $\Omega_M = 0.27$ and $\Omega_\Lambda = 0.73$.

To create volume-limited samples of the main catalogue we use the catalogue available on Mangle's webpage¹.
This catalogue is based on the New York University Value-Added Galaxy Catalog [11]. It contains
r-band absolute magnitudes ($M_r$) for each galaxy that are already K-corrected and corrected for
evolution at a fiducial redshift of $z = 0.1$ following [TU]. The K-correction and evolution correction
are required because galaxies are observed at different redshifts. The K-correction converts galaxy
spectra from the observed to the emitted frame [16]. The evolution correction is required to take into account the
time-evolution of galaxies (and thus their spectra) from their individual observed redshift to a common
redshift for all galaxies [13]. Comoving distances and absolute magnitudes are given in the Mangle
cosmology, so we convert them to the cosmology we use ($\Omega_M = 0.27$, $\Omega_\Lambda = 0.73$).

¹ http://space.mit.edu/~molly/mangle/

We also use a volume-limited sample of LRGs drawn directly from SDSS-DR7. LRGs are early-type
galaxies selected using different luminosity and colour cuts [5], and extending to higher redshift. We
compute the K-corrected g-band absolute magnitudes ($M_g$), corrected for evolution at a fiducial
redshift of $z = 0.3$, following the method described in [5].

In both cases, we obtain volume-limited samples by dividing the survey into different galaxy populations
(according to the absolute magnitude in each case) and then cutting the sample at a minimum
and a maximum redshift so that the density remains approximately constant. The selected volume-limited
samples from the main catalogue are similar to those used by [8], while the LRG one is the
same as used by [22].

Finally we restrict the samples to a region of the sky that is nearly complete except for small areas
masked by bright stars. For this we cut the sample in the survey coordinate system $(\eta, \lambda)$ with limits
$-31.25^\circ < \eta < 28.75^\circ$ and $-54.8^\circ < \lambda < 51.8^\circ$. Because of this, the samples are smaller and we
have fewer galaxies for the correlation function estimation, but it is simpler to obtain simulations in the
same volume.

We give in Table 2 the magnitude and redshift limits used to construct the four volume-limited
samples. We also give their total number of galaxies ($N_g$), volume ($V$) and mean density ($\bar{n}$).



Name    Magnitude limits            Redshift limits      Distance limits        $N_g$     $V$                 $\bar{n}$
                                                         ($h^{-1}$ Mpc)                   ($h^{-1}$ Mpc)^3    ($h^{-1}$ Mpc)^-3
main1   M_r < -20                   0.038 < z < 0.119    113.04 < d < 347.92    127223    22.757 x 10^6       5.590 x 10^-3
main2   M_r < -21                   0.059 < z < 0.168    174.73 < d < 485.88    67189     61.200 x 10^6       1.098 x 10^-3
main3   M_r < -21.5                 0.071 < z < 0.205    209.73 < d < 587.95    30272     108.565 x 10^6      2.788 x 10^-4
LRG     -23.544 < M_g < -21.644     0.14 < z < 0.42      410 < d < 1140         34347     790.4 x 10^6        4.345 x 10^-5


Table 2: Characteristics of the SDSS samples. Distance limits and volume are given for the particular cosmology
$\Omega_M = 0.27$ and $\Omega_\Lambda = 0.73$.
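The mean densities in Table 2 follow directly from the galaxy numbers and volumes, $\bar{n} = N_g / V$. The short check below uses only the tabulated values; the dictionary layout is an illustrative choice.

```python
# name: (N_g, V in (h^-1 Mpc)^3, tabulated mean density in (h^-1 Mpc)^-3)
table_2 = {
    "main1": (127223, 22.757e6, 5.590e-3),
    "main2": (67189, 61.200e6, 1.098e-3),
    "main3": (30272, 108.565e6, 2.788e-4),
    "LRG": (34347, 790.4e6, 4.345e-5),
}

for name, (n_gal, volume, n_bar) in table_2.items():
    density = n_gal / volume
    print(f"{name}: N_g / V = {density:.4e}, tabulated = {n_bar:.3e}")
```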

2.2. Simulations 

2.2.1. The lognormal model 

The usual paradigm for the distribution of galaxies $n_g$ is the Cox process, i.e. a Poisson process
with an intensity given by a continuous field $\rho_g(x)$, which itself is a statistical process. Knowing $\rho_g(x)$,
the number of galaxies in a volume $dV$ around $x$ is a Poisson variable with intensity $\rho_g(x)\,dV$. It can
be verified that the correlation function of the point process is the same as that of the underlying continuous
process $\rho_g$, plus a weighted Dirac function $\frac{1}{\bar{n}}\delta$ due to the discreteness.

The process $\rho_g$ is linked to the underlying matter density field $\rho_m$ since galaxies form in matter
over-densities, but is not supposed to be identical. Indeed it has been observed that correlation is higher
in the galaxy distribution than in the matter field, and also depends on galaxy population. The ratio
of the two is the square of the mass-luminosity bias $b$. Note that the term bias here has a different
meaning than when we speak about the bias of an estimator. The mass-luminosity bias quantifies
how fluctuations are amplified in the distribution of galaxies, whereas the bias of an estimator is the
difference between its expected value and the quantity to estimate.

In general $b$ should depend on the scale but here we simplify and consider it constant:

$$\xi_g(r) = b^2\, \xi_m(r). \qquad (24)$$

This simplified model should be a good first order approximation, especially given that we are
focusing on the correlation at large scales. This model also takes into account the effect of the peculiar
velocities of galaxies in the correlation measurement, known as redshift space distortions. In the
simplest plane-parallel approximation, this effect shows up as an extra factor multiplying $\xi$ [17], which in
our case is absorbed in the value of $b$.
We consider a galaxy field $\rho_g$ following a lognormal model as proposed in [5]. A lognormal field $Y$
with an expected value of 1 is obtained from a gaussian field $X$ by:

$$Y = e^{X - \sigma^2/2}, \qquad (25)$$

where $\sigma^2$ is the variance of the zero-mean gaussian field $X$.

This model has been successfully applied to density field reconstruction in [19], where it enters as
a prior model for the matter field. The lognormal model is quite simple and has other interesting
properties (see [5]):

• It describes well the distribution of galaxies as found by Hubble (1934) and recently in [19] when
the galaxy field is smoothed on scales between 10 and 30 Mpc

• The positivity of the field is ensured unlike in a gaussian model

• Numerous quantities can be calculated as easily as for the gaussian field, e.g. statistics of the
peaks, genus

• It is arbitrarily close to a gaussian field at early times where $\sigma \ll 1$

• It is the solution of the equations of evolution of $\rho$ when supposing that the initial density field
peculiar velocities are gaussian

In the simulations we start by generating the underlying gaussian field and obtain the corresponding
lognormal field $Y$ using equation (25). The gaussian random field is simple to generate using random
Fourier modes $k$ that are gaussian with variances $P_G(k)$ (with $P_G$ the underlying gaussian power
spectrum).

For a given power spectrum for the lognormal field $P_{LN}$, we have to know the power spectrum
of the underlying gaussian field $P_G$. The relationship between the two fields is simple in terms of
covariances (the covariance of the field $Y$ is equal to its correlation function since $E[Y] = 1$):

$$C_G(r) = \ln[1 + C_{LN}(r)] \qquad (26)$$

The first step is to compute the covariance of the lognormal field $C_{LN}$ from its power spectrum
$P_{LN}$ by an inverse Fourier transform in 3 dimensions, i.e. by a Hankel transform in the isotropic case.
The power spectrum has bins with exponential sizes in $k$ (i.e. the $\ln(k_i)$'s are spaced linearly) since it is
smooth in that space. For doing the Hankel transform with this spacing we use the FFTLog program².



From the lognormal covariance $C_{LN}$, we obtain the gaussian covariance $C_G$ using relationship (26).
Finally the power spectrum of the underlying gaussian field $P_G$ is obtained by a Hankel transform of
its covariance $C_G$.

After we have simulated the gaussian field with power spectrum $P_G$ and obtained the lognormal
field $Y$ using equation (25), a last step is to adjust the density of the lognormal field, multiplying $Y$
by the expected density $\bar{n}$.
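The chain from gaussian field to lognormal density can be sketched on a small periodic grid. This is an illustration under several assumptions: the gaussian power spectrum is a toy function rather than the $P_G$ derived from a ΛCDM spectrum, the Hankel transforms between power spectra and covariances are not reproduced, the field is generated by filtering white noise in Fourier space, and the empirical variance of the realization is used for $\sigma^2$. All names are hypothetical.

```python
import numpy as np

def gaussian_cov_from_lognormal(c_ln):
    """Covariance of the gaussian field from the lognormal one: ln(1 + C_LN)."""
    return np.log1p(c_ln)

def gaussian_field(n, box, power, rng):
    """Periodic gaussian field on an n^3 grid of side `box` with spectrum power(k)."""
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)
    amp = np.zeros_like(k)
    # Scaling chosen so that the field variance is (1/V) * sum_k P(k); k = 0 is removed.
    amp[k > 0] = np.sqrt(power(k[k > 0]) * n ** 3 / box ** 3)
    white_k = np.fft.fftn(rng.standard_normal((n, n, n)))
    return np.fft.ifftn(white_k * amp).real

def lognormal_field(x):
    """Lognormal field with mean close to 1, using the empirical variance of x."""
    return np.exp(x - x.var() / 2.0)

def toy_power(k):
    """Toy gaussian-field power spectrum (illustrative, not a LCDM spectrum)."""
    return 500.0 * np.exp(-(5.0 * k) ** 2)

rng = np.random.default_rng(0)
x = gaussian_field(n=32, box=100.0, power=toy_power, rng=rng)
y = lognormal_field(x)
n_bar = 1.098e-3                 # e.g. the main2 mean density of Table 2
rho_g = n_bar * y                # density field with the expected mean density
```

The resulting field is positive everywhere and has a mean close to 1 before rescaling by $\bar{n}$, which are the two properties the lognormal transform is meant to guarantee.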



2.2.2. Adjusting simulation parameters 

We adopt, as $P_{LN}$ for our simulations, a ΛCDM power spectrum $P_{\Lambda CDM}$ given by the iCosmo
software with the following cosmological parameters: $h = 0.7$, $\Omega_b = 0.045$, $\Omega_M = 0.27$, $\Omega_\Lambda = 0.73$,
$w_0 = -1.0$, $n_s = 1.0$, $\sigma_8 = 0.8$. We decided to reproduce the main2 sample, given in section 2.1,
which is an average main sample, and the LRG sample. We take the power spectrum at the mean
redshift for each sample, i.e. at redshift $z = 0.1$ for the main2 sample, and at $z = 0.3$ for the LRG
sample.

The simulations give the continuous field $\rho_g$ on a discrete grid of size 700 by 700 by 700 with a
physical size of $(1200\, h^{-1}\,\mathrm{Mpc})^3$ for the main2 sample and $(1600\, h^{-1}\,\mathrm{Mpc})^3$ for the LRG sample,
i.e. with elementary cells respectively of $1.71\, h^{-1}$ Mpc and $2.29\, h^{-1}$ Mpc. We then place in each cell
a number of galaxies which is a Poisson realization of the cell intensity $\rho_g$, with each galaxy placed
at random in the cell (i.e. we assume a constant value of $\rho_g$ in each cell). This has the effect
of smoothing the correlation function approximately on the scale of the cell size. The cubic volume is much
larger than the final samples but this is done on purpose, since simulations have implicit periodic boundary
conditions that create correlations between opposite sides of the cube. We get rid of these correlations
by cutting the samples far away from the border.

² http://casa.colorado.edu/~ajsh/FFTLog/
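The Poisson sampling step can be sketched as follows. The function name `poisson_sample` and the small constant-density grid used in the example are assumptions for illustration; the density field is taken as given, in units of galaxies per unit volume.

```python
import numpy as np

def poisson_sample(density, cell, rng):
    """Draw galaxy positions from a gridded density field.

    density : array of shape (n, n, n), expected number density in each cell
    cell    : cell size, in the same length unit as 1 / density^(1/3)
    """
    counts = rng.poisson(density * cell ** 3)           # galaxies per cell
    occupied = np.argwhere(counts > 0)                  # indices of non-empty cells
    idx = np.repeat(occupied, counts[counts > 0], axis=0)
    # Each galaxy is placed uniformly at random inside its cell.
    return (idx + rng.uniform(size=idx.shape)) * cell

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = np.full((10, 10, 10), 2.0)    # toy field: constant density of 2 per unit volume
    galaxies = poisson_sample(rho, cell=1.0, rng=rng)
    print(galaxies.shape)               # about 2000 galaxies expected
```

Because positions are uniform within a cell, any structure below the cell size is erased, which is the smoothing of the correlation function mentioned above.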

We choose a mean density of points in the volume that gives on average the same number of points 
as in the SDSS samples. 

A last step is to choose the mass-luminosity bias $b$ between the samples and the ΛCDM matter
correlation function. For estimating this factor we fit the ΛCDM correlation function to the one
estimated on the data, $\hat{\xi}$:

$$b^2\, \xi_{\Lambda CDM} \approx \hat{\xi} \qquad (27)$$

By this method we find a bias for the main samples (the variation of $b$ is rather small between the
different main samples) and for the LRG sample, compared to ΛCDM at redshift $z = 0.1$ and $z = 0.3$ respectively.
We find respectively $b = 1.5$ and $b = 2.5$ (figure 3): as usually observed, the bias increases with
luminosity ([ST], [32]). The bias obtained for the LRG sample is a bit larger than the one usually found for
LRGs, $b \approx 2$ (e.g. in [H]). This probably comes from the fact that we selected only the brightest galaxies
of the LRG population.
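One simple way to carry out such a fit is an unweighted least-squares estimate of the amplitude $b^2$. The text does not specify the fitting method, so this choice, the function name `fit_bias` and the toy power-law model in the example are assumptions for illustration only.

```python
import numpy as np

def fit_bias(xi_data, xi_model):
    """Least-squares amplitude fit of xi_data ~ b^2 * xi_model; returns b."""
    xi_data = np.asarray(xi_data, dtype=float)
    xi_model = np.asarray(xi_model, dtype=float)
    b_squared = np.sum(xi_data * xi_model) / np.sum(xi_model ** 2)
    return np.sqrt(b_squared)   # NaN if the fitted amplitude is negative

if __name__ == "__main__":
    r = np.linspace(20.0, 100.0, 9)
    xi_model = (r / 5.0) ** -1.8          # toy matter correlation function
    xi_data = 1.5 ** 2 * xi_model         # noiseless toy "data" with b = 1.5
    print(fit_bias(xi_data, xi_model))
```

A weighted fit using the covariance of the estimated correlation function would be a natural refinement, at the cost of needing that covariance first.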




Figure 3: Left Panel: Estimation of the mass-luminosity bias $b$ by fitting the correlation function to the ΛCDM correlation.
3 volume-limited samples main1, main2, main3 and ΛCDM correlation at $z = 0.1$ with $b = 1.5$. Right Panel: LRG
sample and ΛCDM correlation at $z = 0.3$ with $b = 2.5$



3. Uncertainty in estimating $\xi$

3.1. Bias and variance of the estimators 

We use respectively $N = 200$ and $N = 2000$ lognormal simulations for the main2 and LRG samples
with the procedure described before, and compute the different estimators for each realization. We
use more simulations for the LRG sample because we want to estimate the covariance matrix of $\hat{\xi}$ in
this sample (see section 3.4).

Each time we use 100 000 random points for computing the estimators (i.e. the quantities $RR(r)$ and
$DR(r)$ introduced in section 1.1). This number is large enough so that the corresponding error is small.
Each time a different random catalogue is used, so when we take the mean over all realizations for
the analysis of the bias, the effect of the finite number of random points is completely negligible. Yet on
individual realizations, the fluctuation due to the finite number of random points can slightly increase
the variance of the estimators. For a given contribution to the variance, the number of required points
is related to the volume size and geometry, and to the size of the bins for estimating $\xi$ (in all our tests
we took bins of size $10\, h^{-1}$ Mpc). More precisely the condition is that $\frac{1}{N_{RR}} RR(r)$ approximates with a
given precision $\frac{1}{|V|^2} \int_V d^3x \int_V d^3y\ \mathbf{1}_{|y - x| \in [r \pm dr/2]}$.

We show in figure 4 the estimators' means and standard deviations compared to the theoretical
$\Lambda$CDM correlation function. For clarity the curves have been shifted by $\pm 1$ $h^{-1}$ Mpc. A bias
can be seen for the estimation in the main sample, with the mean differing from the true value by
approximately half of the standard deviation for r > 90 $h^{-1}$ Mpc. This is shown clearly in the inset,
where we plot the mean and the uncertainty in the empirical mean over the N simulations, i.e.
$\sigma/\sqrt{N}$. On the LRG sample, the estimators' means are nearly indistinguishable from the
theoretical values.
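
The distinction between the scatter of one realization and the uncertainty in the empirical mean can be illustrated with a minimal sketch. The function name, the array layout and the toy numbers below are assumptions made for illustration; they are not taken from the analysis code of this work.

```python
import numpy as np

def summarize_realizations(xi_hat, xi_true):
    """Mean, scatter and bias of an estimator over mock realizations.

    xi_hat  : array (N, n_bins), one estimated correlation function per mock
    xi_true : array (n_bins,), input correlation function of the mocks
    (names and layout are illustrative assumptions)
    """
    n_real = xi_hat.shape[0]
    mean = xi_hat.mean(axis=0)
    sigma = xi_hat.std(axis=0, ddof=1)   # scatter of a single realization
    sem = sigma / np.sqrt(n_real)        # uncertainty in the empirical mean
    bias = mean - xi_true
    return mean, sigma, sem, bias

# Toy data (made up): 200 mocks, 3 bins, an injected bias of -0.002
rng = np.random.default_rng(0)
xi_true = np.array([0.02, 0.01, 0.005])
toy = xi_true - 0.002 + 0.004 * rng.standard_normal((200, 3))
mean, sigma, sem, bias = summarize_realizations(toy, xi_true)
print(bias / sigma)  # bias in units of sigma
print(bias / sem)    # bias in units of sigma / sqrt(N)
```

A bias of half a standard deviation is hard to see on a single realization, but it is many times larger than $\sigma/\sqrt{N}$ once N = 200 realizations are averaged, which is why the inset uses the latter error bar.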

This also validates our simulation process, whose output correlation function fits the input one
very well. There is a small difference at the BAO scale (in addition to the bias) that we attribute
to the smoothing introduced by the grid discretization described in section 2.2.2. The BAO is
a local maximum, so the function is lowered there by the smoothing.

Concerning the estimators' variances, they are much smaller on the LRG sample than on the main
sample, since the volume is bigger and the Poisson fluctuations remain small for a number of galaxies
$N_D \approx 34000$ and bins of size 10 $h^{-1}$ Mpc.

We also see that the Hamilton and Landy-Szalay estimators are much better than the two others in
terms of variance. This agrees with previous studies [26], [18] showing a superiority of these estimators
on different processes. It also agrees with the analysis in [20] considering a Poisson process with no
correlation. In the latter case, the Landy-Szalay and Hamilton estimators have a second order variance
decay in $1/n$, with n the number of data points (i.e. a $1/|V|^2$ decay with $|V|$ the volume size), whereas
Peebles-Hauser and Davis-Peebles have a first order decay in $1/n$.
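
As a reminder of how the four estimators compared here are built from pair counts, the following sketch uses their standard textbook forms. The normalization of the counts (unordered pairs for DD and RR, all cross pairs for DR) is an assumption; the convention used in the earlier sections of this work may differ by such factors.

```python
import numpy as np

def two_point_estimators(dd, dr, rr, n_d, n_r):
    """Four classical 2PCF estimators from raw pair counts per bin.

    dd, rr : counts of distinct (unordered) data-data and random-random pairs
    dr     : counts of data-random cross pairs
    n_d, n_r : number of data and random points
    (normalization convention is an assumption of this sketch)
    """
    dd_n = dd / (n_d * (n_d - 1) / 2.0)   # fraction of data pairs in each bin
    rr_n = rr / (n_r * (n_r - 1) / 2.0)   # fraction of random pairs in each bin
    dr_n = dr / (n_d * n_r)               # fraction of cross pairs in each bin
    return {
        "PH": dd_n / rr_n - 1.0,                    # Peebles-Hauser
        "DP": dd_n / dr_n - 1.0,                    # Davis-Peebles
        "H": dd_n * rr_n / dr_n**2 - 1.0,           # Hamilton
        "LS": (dd_n - 2.0 * dr_n + rr_n) / rr_n,    # Landy-Szalay
    }
```

With identical normalized counts all four return 0, as expected for an unclustered sample. The Hamilton and Landy-Szalay forms combine DD, DR and RR in such a way that a fluctuation in the number of data points cancels at first order, which is the origin of their faster variance decay.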




Figure 4: Left Panel: Different estimators' means and $\sigma$ on N = 200 main2 realizations, Peebles-Hauser (purple),
Davis-Peebles (light blue), Hamilton (green), Landy-Szalay (red) and $\Lambda$CDM at z = 0.1 with b = 1.5 (black).
Right Panel: Same for the N = 2000 LRG realizations, except $\Lambda$CDM at z = 0.3 with b = 2.5. In the insets we
zoom in on the correlation at larger scales. Error bars are divided by $\sqrt{N}$, which gives the uncertainty in the
empirical mean over all simulations. This shows the presence of a bias at large scales in the main sample for all
estimators. Estimators show a negligible bias in the LRG sample.



3.2. Effect of the integral constraint 

We are interested here in the influence of the constraint studied in section 1.3. The constraint
is of the form $\int_0^{r_{max}} f(r) \hat{\xi}(r) \approx 0$ with $f(r) \approx \frac{4\pi r^2}{|V|} dr$ for small r.
Assuming $\int_0^\infty r^2 \xi(r) dr$ is finite, the value of $\int_0^{r_{max}} f(r) \xi(r)$ vanishes as $1/|V|$ at large
volumes. In usual $\Lambda$CDM models the power spectrum verifies P(0) = 0, and thus the correlation function
verifies $\int_0^\infty r^2 \xi(r) dr = 0$, which makes the constraint even easier to satisfy.
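
The step from P(0) = 0 to the vanishing integral can be made explicit. Assuming the Fourier convention written below (other conventions differ only by a constant factor) and an isotropic $\xi$, integrating over angles gives:

```latex
P(k) = \int \xi(r)\, e^{-i \mathbf{k}\cdot\mathbf{r}}\, d^3r
     = 4\pi \int_0^\infty r^2\, \xi(r)\, \frac{\sin(kr)}{kr}\, dr
\quad\Longrightarrow\quad
P(0) = 4\pi \int_0^\infty r^2\, \xi(r)\, dr .
```

Since $\sin(kr)/(kr) \to 1$ as $k \to 0$, the condition P(0) = 0 is equivalent to $\int_0^\infty r^2 \xi(r) dr = 0$.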

Table 3 gives the value of the constrained integral for the theoretical $\xi_{\Lambda CDM}$ and for the measured
$\hat{\xi}$, respectively $I$ and $\hat{I}$. For the main2 sample, $\hat{I}$ is significantly closer to 0 than $I$,
meaning that the constraint has an effect on the estimation. The effect is negligible for the LRG sample.

The value of $I$ gives approximately the bias of $\hat{\xi}$ caused by the integral constraint: it is of order
$10^{-3}$ for the main2 sample and of order $10^{-4}$ for the LRG sample. Compared to the values of $\xi$ at
the scales of interest (i.e. usually between 50 and 150 $h^{-1}$ Mpc), the bias is significant for the main2
sample but negligible for the LRG sample.

We can also draw a parallel with the Cox model of section 1.3, where the effect of the constraint
becomes very small for $\sigma(V) \approx 0.03$. The main2 sample has the same value $\sigma(V) \approx 0.03$ but the
effect is still important. In the LRG sample the value is 3 times smaller, $\sigma(V) \approx 0.01$, so it is not
surprising that the effect is negligible.





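| Sample | $I$ | $\sigma(V)$ | $\hat{I}_{PH}$ | $\hat{I}_{DP}$ | $\hat{I}_{H}$ | $\hat{I}_{LS}$ |
|--------|-----|-------------|----------------|----------------|---------------|----------------|
| main2 | $6.97 \times 10^{-4}$ | $\approx 0.03$ | $7.52 \times 10^{-6}$ | $-1.20 \times 10^{-4}$ | $-9.70 \times 10^{-5}$ | $4.96 \times 10^{-5}$ |
| LRG | $1.68 \times 10^{-4}$ | $\approx 0.01$ | $-8.04 \times 10^{-5}$ | $9.14 \times 10^{-6}$ | $1.12 \times 10^{-4}$ | $1.25 \times 10^{-4}$ |

Table 3: Values of $I$ and $\hat{I}$ for the different estimators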



3.3. Reliability of the BAO detection 

We use here the Landy-Szalay estimator, since we verified that it has the lowest variance (together
with the Hamilton estimator, which is nearly equivalent) and that, like the other estimators, it has a
small negative bias for the main2 sample.

With the N = 200 main2 simulations and the N = 2000 LRG simulations we study the detectability
of the BAO in the correlation function, in the form of a bump at about 105 $h^{-1}$ Mpc.
The situations are different for the main2 (mass-luminosity bias b = 1.5) and LRG (b = 2.5) samples.
The main2 sample presents a lower signal than the LRG sample, and also a larger variance of the
estimator due to its smaller volume.

A simple way to detect BAOs is to look for a local maximum significantly above 0 in the
measured $\hat{\xi}$ over a range of scales around the expected BAO scale, e.g. between 80 and 120 $h^{-1}$ Mpc.
So a simple condition to detect BAOs in most realizations is that the estimator's mean is more than
$1\sigma$ above the 0 level.

This detectability condition is verified for the LRG sample but not for the main2 sample. In figure
5, where we plot $\hat{\xi}_{LS}$ for different realizations, we see a positive BAO peak in the majority of the LRG
realizations but less frequently in the main2 realizations, where $\hat{\xi}(r)$ is often negative at the peak position.
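
The two criteria above can be written down as a short sketch. Function names, the array layout and the strict local-maximum test are illustrative assumptions, not the procedure actually coded for this work.

```python
import numpy as np

def mean_is_detectable(xi_hat, r, r_peak=105.0, n_sigma=1.0):
    """Check that the estimator's mean at the bin nearest r_peak lies
    more than n_sigma standard deviations above 0.

    xi_hat : array (N, n_bins) of estimated correlation functions
    r      : array (n_bins,) of bin centres in h^-1 Mpc
    """
    i = int(np.argmin(np.abs(r - r_peak)))
    mean = xi_hat[:, i].mean()
    sigma = xi_hat[:, i].std(ddof=1)
    return bool(mean > n_sigma * sigma)

def fraction_with_positive_peak(xi_hat, r, r_min=80.0, r_max=120.0):
    """Fraction of realizations with a strict local maximum above 0
    at a bin centre inside [r_min, r_max]."""
    idx = np.where((r >= r_min) & (r <= r_max))[0]
    idx = idx[(idx > 0) & (idx < len(r) - 1)]   # need both neighbours
    centre = xi_hat[:, idx]
    is_peak = ((centre > xi_hat[:, idx - 1])
               & (centre > xi_hat[:, idx + 1])
               & (centre > 0))
    return float(is_peak.any(axis=1).mean())
```

The first function tests the condition on the mean; the second counts how often a single realization would actually show a positive bump, which is the quantity visualized in figure 5.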




Figure 5: Left Panel: $\hat{\xi}_{LS}$ for 6 lognormal realizations, mean and $1\sigma$ on 200 main2 realizations (red), and
$\xi_{\Lambda CDM}$ at z = 0.1 with b = 1.5 (black). Right Panel: $\hat{\xi}_{LS}$ for 6 lognormal realizations, mean and $1\sigma$
on 2000 LRG realizations (red), and $\xi_{\Lambda CDM}$ at z = 0.3 with b = 2.5 (black).

3.4. Compatibility with the data

We finally compare $\hat{\xi}_{LS}$ of the data samples given in table 2 with the simulations (figure 6),
keeping in mind that our simulations are not entirely realistic and neglect some systematic effects.
For estimating $\xi$ on the SDSS data we took into account the exact survey mask (the angular region
observed) when generating random catalogues with the Mangle software (see footnote 3). We explained in
section 1.1 the role of the random catalogues in the estimation of $\xi$; they are constructed using the
same geometry as the data catalogue. In our study we restricted the data catalogue to the continuous
sky region $-31.25^\circ < \eta < 28.75^\circ$ and $-54.8^\circ < \lambda < 51.8^\circ$, where the SDSS mask is nearly
uniform except for small holes caused by bright stars. We found that taking the exact mask into account
does not significantly change the results.

3 http://space.mit.edu/~molly/mangle/

For the main samples the estimations are compatible with the simulations. Results of the previous 
section explain why the BAO peak cannot be seen, except on the main3 sample which is the largest 
main sample we constructed. 

For the LRG sample, our results agree with previous studies made on the LRG samples of the
SDSS DR7, with a less limited angular region and more galaxies ([22], [7]). As in these studies, the
BAO peak is much wider than expected: $\hat{\xi}$ deviates from the $\Lambda$CDM value by approximately $3\sigma$
from 140 to 180 $h^{-1}$ Mpc (figure 6).

The widening of the peak is more pronounced at higher redshift, as can be seen by cutting the LRG
sample into 2 redshift ranges (figure 6). This was already found in [1], where an analysis of possible
systematic effects in the correlation function estimation is carried out. The conclusion is that none
can explain this excess in $\hat{\xi}$. In [7] the sample called DR7-Bright is similar to the one used here and
also presents an unlikely fit to a particular $\Lambda$CDM model.

To quantify the significance of the deviation we partly follow the analysis in [7] and perform a
$\chi^2$ test on the correlation function of the whole LRG sample in the range 50 to 200 $h^{-1}$ Mpc and in
the range 50 to 400 $h^{-1}$ Mpc. We introduce a new free bias parameter, $\beta$, and we first minimize over
$\beta$ the quantity:

$$\chi^2(\beta) = \sum_{i,j=1}^{N_b} \left[\hat{\xi}(r_i) - \beta\,\xi_{LN}(r_i)\right] C^{-1}_{ij} \left[\hat{\xi}(r_j) - \beta\,\xi_{LN}(r_j)\right] \qquad (28)$$

with $N_b$ the number of bins of the correlation function in the range that we consider.

For a given $\beta$, this is proportional to the log-likelihood assuming a Gaussian model for $\hat{\xi}$ with mean
$\xi(\beta) = \beta\,\xi_{LN}$ and covariance matrix $(C_{ij})$ between bins. In practice $(C_{ij})$ is estimated with the
N = 2000 lognormal realizations and then inverted. With bins of size 10 $h^{-1}$ Mpc, the covariance
matrices are computed respectively on $N_b = 15$ bins and $N_b = 35$ bins for the analysis in the range
50 to 200 $h^{-1}$ Mpc and in the range 50 to 400 $h^{-1}$ Mpc. This gives respectively 105 and 595 free
parameters in the covariance matrix. The number of simulations (N = 2000) is much greater than
this number of parameters in the first case, and still larger in the second case, which means the
empirical covariance matrix should give a good estimate of the true covariance matrix (see e.g. [27]).
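
A minimal sketch of the covariance estimation step is given below, with toy Gaussian mocks standing in for the lognormal realizations (the toy data and function names are assumptions). The helper `n_offdiag` reproduces the counts 105 and 595 quoted above as $N_b(N_b - 1)/2$; reading "free parameters" as the number of distinct off-diagonal entries is our interpretation.

```python
import numpy as np

def empirical_covariance(xi_hat):
    """Unbiased sample covariance between bins.

    xi_hat : array (N, N_b), one estimated correlation function per mock
    """
    return np.cov(xi_hat, rowvar=False, ddof=1)

def n_offdiag(n_bins):
    """Number of distinct off-diagonal entries of a symmetric matrix."""
    return n_bins * (n_bins - 1) // 2

# Toy stand-in for the mocks (made up): 2000 realizations, 15 independent bins
rng = np.random.default_rng(1)
mocks = 0.01 * rng.standard_normal((2000, 15))
cov = empirical_covariance(mocks)
cov_inv = np.linalg.inv(cov)   # the inverse enters chi^2(beta)

print(n_offdiag(15), n_offdiag(35))
```

When the number of realizations is not much larger than the number of matrix entries, the inverse of the sample covariance becomes noisy, which is why the ratio between N and these counts matters.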

With the Gaussian hypothesis, $\chi^2(\beta_{min})$ follows a $\chi^2$ law with $N_b - 1$ degrees of freedom. We
stress that this is only true because of the special way that $\beta$ enters the fitting form $\xi(\beta)$ of equation
(28), namely linearly, and would not be true for an arbitrary parameter $\theta$ entering a fitting form $\xi(\theta)$.
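
Because $\beta$ enters equation (28) linearly, the minimization has a closed form: setting the derivative of $\chi^2(\beta)$ to zero gives $\beta_{min} = (\xi_{LN}^T C^{-1} \hat{\xi}) / (\xi_{LN}^T C^{-1} \xi_{LN})$. The sketch below implements this under the assumption of a symmetric, invertible covariance matrix; the function and variable names are illustrative.

```python
import numpy as np

def fit_beta(xi_hat, xi_ln, cov):
    """Minimize chi2(beta) = (xi_hat - beta*xi_ln)^T C^-1 (xi_hat - beta*xi_ln).

    xi_hat : array (N_b,), measured correlation function
    xi_ln  : array (N_b,), model correlation function
    cov    : array (N_b, N_b), symmetric covariance matrix between bins
    Returns (beta_min, chi2 at beta_min).
    """
    w = np.linalg.solve(cov, xi_ln)            # C^-1 xi_ln
    beta_min = (w @ xi_hat) / (w @ xi_ln)      # closed-form minimizer
    resid = xi_hat - beta_min * xi_ln
    chi2_min = resid @ np.linalg.solve(cov, resid)
    return beta_min, chi2_min
```

Fitting one linear parameter removes exactly one degree of freedom, hence the $N_b - 1$ quoted above.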

With this procedure we find a value $\chi^2 = 30.57$ for 14 degrees of freedom in the range 50 to 200
$h^{-1}$ Mpc, and a value $\chi^2 = 60.86$ for 34 degrees of freedom in the range 50 to 400 $h^{-1}$ Mpc.
These 2 values correspond to p-values of respectively $6.3 \times 10^{-3}$ and $3.1 \times 10^{-3}$. Another way to
obtain p-values, without the Gaussian hypothesis, is to perform the same procedure on the lognormal
realizations. For each lognormal realization, we obtain a value $\chi^2(\beta_{min})$, where $\beta_{min}$ is calculated
each time to minimize $\chi^2(\beta)$. Among the N = 2000 realizations we obtain 16 realizations that have
higher values of $\chi^2(\beta_{min})$ for the range 50 to 200 $h^{-1}$ Mpc and 9 realizations for the range 50 to 400
$h^{-1}$ Mpc, i.e. we obtain p-values of respectively $8 \times 10^{-3}$ and $4.5 \times 10^{-3}$. Thus we find an unlikely
fit to the particular $\Lambda$CDM model used here.
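
The simulation-based p-value is simply the fraction of realizations whose minimized $\chi^2$ exceeds the value obtained on the data. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def monte_carlo_p_value(chi2_data, chi2_sims):
    """Fraction of simulated chi2(beta_min) values above the data value."""
    chi2_sims = np.asarray(chi2_sims)
    return np.count_nonzero(chi2_sims > chi2_data) / chi2_sims.size
```

With 16 and 9 exceedances out of 2000 realizations this gives 16/2000 = $8 \times 10^{-3}$ and 9/2000 = $4.5 \times 10^{-3}$. The resolution of such an estimate is limited to 1/N, here $5 \times 10^{-4}$.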

An explanation could be that lognormal simulations do not correctly capture the variance and
covariance of the real galaxy distribution. It could also be due to systematics of the analysis:
absence of scale-dependent mass-luminosity bias in the simulations, or a possibly wrong redshift-to-distance
conversion in the data. Also, with different cosmological parameters in the $\Lambda$CDM correlation function,
the results would have been different, and the deviation possibly less significant.

Figure 6: Left panel: data $\hat{\xi}_{LS}$ for different volume-limited samples, main1 (purple), main2 (light blue),
main3 (green), mean and $1\sigma$ on 200 lognormal realizations of the main2 sample (red) and $\xi_{\Lambda CDM}$ at z = 0.1
with b = 1.5 (black). Right Panel: data $\hat{\xi}_{LS}$ for the LRG volume-limited sample with 0.14 < z < 0.29 (light blue),
0.29 < z < 0.42 (purple) and the whole sample 0.14 < z < 0.42 (green), mean and $1\sigma$ on 2000 lognormal
realizations (red) and $\xi_{\Lambda CDM}$ at z = 0.3 with b = 2.5 (black).



Conclusion 

We have studied uncertainties in correlation function estimators with two different goals: comparing
the different estimators on current galaxy surveys (in particular at large scales for BAO study), and
studying the bias created by the integral constraint.

We simulated lognormal mock galaxy catalogues; the different parameters of the simulations were
adjusted to those of the SDSS samples: mean redshift of the $\Lambda$CDM input power spectrum, density of
galaxies in the sample, mass-luminosity bias. Using enough realizations, we quantified the uncertainty
in $\hat{\xi}$ coming from both the estimators' variances and biases.

We first compared the different estimators, in particular regarding their sensitivity to the fluctuation 
in the number of galaxies n (i.e. the uncertainty in the mean density): Peebles-Hauser and Davis- 
Peebles depend at first order on that fluctuation; whereas Hamilton and Landy-Szalay have a second 
order dependence. As a consequence, the variances of the first two estimators have only a first order 
decay in the volume size, whereas the two latter estimators have a second order decay. We confirmed 
with the simulations that Hamilton and Landy-Szalay have much smaller variances. 

Then we evaluated the effect of the integral constraint in our simulations: it can affect the estimation
for small volumes, but it becomes negligible when the real $\xi$ itself is close to verifying the constraint.
For the Cox process the effect becomes very small when fluctuations in the volume are less than 3%
($\sigma(V) < 0.03$). This homogeneity level is achieved for one of the main galaxy samples. Yet for this
sample the integral constraint still affects the estimation, with a bias in $\hat{\xi}(r)$ of approximately $0.5\sigma$ for
r > 90 $h^{-1}$ Mpc. For the LRG sample, with $\sigma(V) \approx 0.01$, the estimators are unbiased, thus the integral
constraint does not affect the BAO study.

Finally we were able to determine the reliability of the BAO detection using the estimated correlation
function: it is reliable for the LRG sample but not for the main samples. This confirms detections
of the BAO signal on the LRG sample we considered and in other studies of $\xi$ on the LRG of SDSS-DR7.
However there remains a large deviation between $\hat{\xi}$ estimated on the data and our model $\xi_{\Lambda CDM}$.
It consists of a $3\sigma$ deviation from 140 to 180 $h^{-1}$ Mpc, which leads to an unlikely fit to our $\Lambda$CDM
model. The reason for this deviation has not been identified clearly; it could come from systematic
effects not taken into account or variance underestimation in the simulations.



Acknowledgements 

This research was supported by the European Research Council grant ERC-228261 and by the
Spanish CONSOLIDER project AYA2006-14056 (including FEDER contributions). PAM acknowledges
support from the Spanish Ministerio de Educación through a FPU contract.

We acknowledge the use of the Sloan Digital Sky Survey data (http://www.sdss.org) and of the 
NYU Value-Added Galaxy Catalog (http://sdss.physics.nyu.edu/vagc/). 

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the
Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National
Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and
the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. 
The Participating Institutions are the American Museum of Natural History, Astrophysical Institute 
Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University 
of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation 
Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for 
Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences 
(LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the 
Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, 
University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval 
Observatory, and the University of Washington. 



References 

[1] A. Cabré and E. Gaztañaga. Clustering of Luminous Red Galaxies - I. Large-scale redshift-space
distortions. Monthly Notices of the Royal Astronomical Society, 393:1183-1208, 2009.

[2] P. Coles and B. Jones. A lognormal model for the cosmological mass distribution. Monthly Notices 
of the Royal Astronomical Society, 248:1-13, 1991. 

[3] M. Davis and P. J. E. Peebles. A survey of galaxy redshifts. V - The two-point position and 
velocity correlations. Astrophysical Journal, 267:465-482, 1983. 

[4] D. J. Eisenstein and W. Hu. Baryonic Features in the Matter Transfer Function. Astrophysical 
Journal, 496(2):605, 1998. 

[5] D. J. Eisenstein et al. Spectroscopic Target Selection for the Sloan Digital Sky Survey: The 
Luminous Red Galaxy Sample. Astronomical Journal, 122:2267-2280, 2001.

[6] D. J. Eisenstein et al. Detection of the Baryon Acoustic Peak in the Large-Scale Correlation
Function of SDSS Luminous Red Galaxies. Astrophysical Journal, 633:560-574, 2005. 

[7] E. A. Kazin et al. The Baryonic Acoustic feature and large-scale clustering in the Sloan Digital 
Sky Survey Luminous Red Galaxy Sample. Astrophysical Journal, 710(2):1444, 2010. 

[8] I. Zehavi et al. The Luminosity and Color Dependence of the Galaxy Correlation Function. 
Astrophysical Journal, 630:1-27, 2005. 

[9] K. N. Abazajian et al. The Seventh Data Release of the Sloan Digital Sky Survey. Astrophysical 
Journal Supplement Series, 182:543-558, 2009. 

[10] M. R. Blanton et al. The Galaxy Luminosity Function and Luminosity Density at Redshift z = 
0.1. Astrophysical Journal, 592:819-838, 2003. 

[11] M. R. Blanton et al. New York University Value-Added Galaxy Catalog: A Galaxy Catalog Based 
on New Public Surveys. Astronomical Journal, 129:2562-2578, June 2005. 

[12] W. J. Percival et al. Baryon acoustic oscillations in the Sloan Digital Sky Survey Data Release 7 
galaxy sample. Monthly Notices of the Royal Astronomical Society, 401:2148-2168, 2010. 

[13] M. Fioc and B. Rocca-Volmerange. PEGASE: a UV to NIR spectral evolution model of galaxies. 
Application to the calibration of bright galaxy counts. Astronomy and Astrophysics, 326:950-962, 
1997. 






[14] A. Gabrielli, M. Joyce, and F. S. Labini. Glass-like universe: Real-space correlation properties of
standard cosmological models. Phys. Rev. D, 65(8):083523, 2002.

[15] A. J. S. Hamilton. Toward better ways to measure the galaxy correlation function. Astrophysical
Journal, 417:19, 1993.

[16] D. W. Hogg, I. K. Baldry, M. R. Blanton, and D. J. Eisenstein. The K correction.
astro-ph/0210394, 2002.

[17] N. Kaiser. Clustering in real space and in redshift space. Monthly Notices of the Royal
Astronomical Society, 227:1-21, July 1987.

[18] M. Kerscher, I. Szapudi, and A. S. Szalay. A Comparison of Estimators for the Two-Point
Correlation Function. Astrophysical Journal, 535:L13-L16, 2000.

[19] F.-S. Kitaura, J. Jasche, and R. B. Metcalf. Recovering the non-linear density field from the
galaxy distribution with a Poisson-lognormal filter. Monthly Notices of the Royal Astronomical
Society, 403:589-604, 2010.

[20] S. D. Landy and A. S. Szalay. Bias and variance of angular correlation functions. Astrophysical
Journal, 412:64-71, 1993.

[21] C. Li, G. Kauffmann, Y. P. Jing, S. D. M. White, G. Boerner, and F. Z. Cheng. The dependence of
clustering on galaxy properties. Monthly Notices of the Royal Astronomical Society, 368(1):21-36,
2006.

[22] V. J. Martínez, P. Arnalte-Mur, E. Saar, P. de la Cruz, M. J. Pons-Bordería, S. Paredes,
A. Fernández-Soto, and E. Tempel. Reliability of the Detection of the Baryon Acoustic Peak.
Astrophysical Journal, 696:L93-L97, 2009.

[23] V. J. Martínez and E. Saar. Statistics of the Galaxy Distribution. Chapman & Hall/CRC, 2002.

[24] P. J. E. Peebles. The Large-Scale Structure of the Universe. Princeton University Press, 1980.

[25] P. J. E. Peebles and M. G. Hauser. Statistical Analysis of Catalogs of Extragalactic Objects. III.
The Shane-Wirtanen and Zwicky Catalogs. Astrophysical Journal Supplement Series, 28:19,
1974.

[26] M. Pons-Bordería, V. J. Martínez, D. Stoyan, H. Stoyan, and E. Saar. Comparing estimators of
the galaxy correlation function. Astrophysical Journal, 523:480-491, 1999.

[27] A. Pope and I. Szapudi. Shrinkage Estimation of the Power Spectrum Covariance Matrix. Monthly
Notices of the Royal Astronomical Society, 389:766-774, 2008.

[28] A. Refregier, A. Amara, T. Kitching, and A. Rassat. iCosmo: an Interactive Cosmology Package.
arXiv:0810.1285, 2008.

[29] U. Sawangwit, T. Shanks, F. B. Abdalla, R. D. Cannon, S. M. Croom, A. C. Edge, N. P. Ross,
and D. A. Wake. Angular correlation function of 1.5 million LRGs: clustering evolution and a
search for BAO. arXiv:0912.0511, 2009.

[30] D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic geometry and its applications. John Wiley &
Sons, Chichester, 1995.

[31] F. Sylos Labini, M. Montuori, and L. Pietronero. Scale-invariance of galaxy clustering. Physics
Reports, 293:61-226, 1998.

[32] I. Zehavi, D. J. Eisenstein, R. C. Nichol, M. R. Blanton, D. W. Hogg, J. Brinkmann, Jon Loveday,
A. Meiksin, D. P. Schneider, and M. Tegmark. The Intermediate-Scale Clustering of Luminous
Red Galaxies. Astrophysical Journal, 621(1):22, 2005.






